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Abstract — Gaussian belief propagation (GaBP) is an iterative 
message-passing algorithm for inference in Gaussian graphical 
models. It is known that when GaBP converges it converges to the 
correct MAP estimate of the Gaussian random vector and simple 
sufficient conditions for its convergence have been established. 

In this paper we develop a double-loop algorithm for forcing 
convergence of GaBP. Our method computes the correct MAP 
estimate even in cases where standard GaBP would not have 
converged. We further extend this construction to compute 
least-squares solutions of over-constrained linear systems. We 
believe that our construction has numerous applications, since 
the GaBP algorithm is linked to solution of linear systems of 
equations, which is a fundamental problem in computer science 
and engineering. As a case study, we discuss the linear detection 
problem. We show that using our new construction, we are able 
to force convergence of Montanari's linear detection algorithm, 
in cases where it would originally fail. As a consequence, we 
are able to increase significantly the number of users that can 
transmit concurrently. 

I. Introduction 

The Gaussian belief propagation algorithm is an efficient 
distributed message-passing algorithm for inference over a 
Gaussian graphical model. GaBP is also linked to the canon- 
ical problem of solving systems of linear equations [l]-[3], 
one of the fundamental problems in computer science and 
engineering, which explains the large number of algorithm 
variants and applications. For example, the GaBP algorithm is 
applied for signal processing [3]-[7], multiuser detection [8], 
[9], linear programming [10], ranking in social networks [11], 
support vector machines [12] etc. Furthermore, it was recently 
shown that some existing algorithms are specific instances of 
the GaBP algorithm, including Consensus propagation [13], 
local probability propagation [14], multiuser detection [8], 
Quadratic Min-Sum algorithm [1], Turbo decoding with Gaus- 
sian densities [15] and others. Two general sufficient condi- 
tions for convergence of GaBP in loopy graphs are known: 
diagonal -dominance [16] and walk-summability [17]. See also 
numerous studies in specific settings [1], [8], [13]— [18]. 

In this work, we propose a novel construction that fixes the 
convergence of the GaBP algorithm, for any Gaussian model 
with positive-definite information matrix (inverse covariance 
matrix), even when the currently known sufficient conver- 
gence conditions do not hold. We prove that our construction 
converges to the correct solution. Furthermore, we consider 
how this method may be used to solve for the least-squares 
solution of general linear systems. As a specific application, 



we discuss Montanari's multiuser detection algorithm [8]. By 
using our construction we are able to show convergence in 
practical CDMA settings, where the original algorithm did not 
converge, supporting a significantly higher number of users on 
each cell. 

This paper is organized as follows. Section [TT] outlines 
the problem model. Section [Hi] gives a brief introduction to 
the GaBP algorithm. Section |IV] describes our novel double- 
loop construction for positive definite matrices. Section [V] 
extends the construction for computing least-squares solution 
of general linear systems. We provide experimental results of 
deploying our construction in the linear detection context in 
Section [VI] We conclude in Section I VIII 



II. Problem setting 

We wish to compute the maximum a posteriori (MAP) 
estimate of a random vector x with Gaussian distribution (after 
conditioning on measurements): 



p{x) oc exp{ — ^x T Jx + h T x} . 



(1) 



where J >- is a symmetric positive definite matrix (the 
information matrix) and h is the potential vector. This problem 
is equivalent to solving Jx = h for x given (h, J) or to solve 
the convex quadratic optimization problem: 



minimize f(x) = \x T Jx — h 1 



(2) 



We may assume without loss of generality (by rescaling 
variables) that J is normalized to have unit-diagonal, that is, 
J = I — R with R having zeros along its diagonal. The off- 
diagonal entries of R then correspond to partial correlation 
coefficients [19]. Thus, the fill pattern of R (and J) reflects the 
Markov structure of the Gaussian distribution. That is, p{x) is 
Markov with respect to the graph with edges Q = ^ 
0}. 

If the model J = I — R is walk-summable [17], [18], 
such that the spectral radius of \R\ = (|r.y|) is less than one 
(p(|i?|) < 1), then the method of GaBP may be used to solve 
this problem. We note that the walk-summable condition im- 
plies J — i? is positive definite. An equivalent characterization 
of the walk-summable condition is that / — \R\ is positive 
definite. 
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TABLE I 

Computing x* = argmax^ exp(-ix T Jx + h T x) viaGaBP. 



III. Gaussian belief propagation 

The Gaussian belief propagation algorithm is an efficient 
distributed message-passing algorithm for inference over a 
Gaussian graphical model. Given the Gaussian density func- 
tion (Q~|i or objective function (O, we are interested in com- 
puting the MAP assignment: 

x* = argmaxp(a;) = argmin/(x) 

X X 

The density p(x) specifies a graphical model with re- 
spect to the graph G of the inverse covariance matrix 
J, with edge potentials ('compatibility functions') ipij and 
self -potentials ('evidence') ipi. These graph potentials pro- 
vide a pairwise factorization of the Gaussian distribution 
p(x) oc H" =1 ipt(xi)l\ {i j- }eG ^i 3 (x l ,x : j), with xp ij (x i ,x j ) = 
exp(—XiJijXj), and ^(x,) = cxp ( - \Jux\ + h x Xi). Then, 
we would like to calculate the marginal densities, which must 
also be Gaussian, 

where p>i and Ki are the marginal mean and variance, respec- 
tively. The GaBP update rules are summarized in Table Q] We 
write N(i) to denote the set of neighbors of node i in G. 

It is known that if GaBP converges, it results in the 
exact MAP estimate x*, although the variance estimates Ki 
computed by GaBP are only approximations to the correct 
variances [16]. The walk-summable condition guarantees that 
GaBP converges [17], generalizing the stricter condition [16] 
that J is diagonally dominant (i.e., \Ju\ > Si^i l>^)- 
An upper bound on convergence speed is given in [10]. 

IV. Our construction 

This current paper presents a method to solve non- 
walksummable models, where J = / — R is positive definite 
but p{\R\) > 1, using GaBP. There are two key ideas: (1) using 
diagonal loading to create a perturbed model J' = J+T which 
is walk-summable (such that the GaBP may be used to solve 
J'x = h for any h) and (2) using this perturbed model J' 



and convergent GaBP algorithm as a preconditioner in a sim- 
ple iterative method to solve the original non-walksummable 
model. 

A. Diagonal Loading 

We may always obtain a walk-summable model by diagonal 
loading. This is useful as we can then solve a related system 
of equations efficiently using Gaussian belief propagation. For 
example, given a non-walk-summable model J = I — R we 
obtain a related walk-summable model J 7 = J + jl that is 
walk-summable for large enough values of 7: 

Lemma 1: Let J = I- R and J 1 = J + 7/ = (l+j)I-R. 
Let 7 > 7* where 

j*=p(\R\)-l. (3) 

Then, J' is walk-summable and GaBP based on J' converges. 

Proof. We normalize J' = (1 + 7)/ — R to obtain Jn orm = 
I — R' with R' = (1 + r y)~ 1 R, which is walk-summable if 
and only if p(\R'\) < 1. Using p(\R'\) = (1 + j)~ 1 p(\R\) we 
obtain the condition (1 + j)~ 1 p(\R\) < 1, which is equivalent 
to 7 > p{\R\) - l.o 

It is also possible to achieve the same effect by adding a 
general diagonal matrix V to obtain a walk-summable model. 
For example, for all T > T* where 7^ = J a — Y^j^i l^yl 
it holds that J + T is diagonally-dominant and hence walk- 
summable (see [17]). More generally, we could allow T to be 
any symmetric positive-definite matrix satisfying the condition 
I + r >- \R\. However, only the case of diagonal matrices is 
explored in this present paper. 

B. Iterative Correction Method 

Now we may use the diagonally-loaded model J' = J + T 
to solve Jx — h for any value of V > 0. The basic idea 
here is to use the diagonally-loaded matrix J' = J + T as 
a preconditioner for solving the Jx = h using the iterative 
method: 

= (J + T)- 1 (h + Tx^) (4) 

Note that the effect of adding positive T is to reduce the size 
of the scaling factor (J + but we compensate for this 
damping effect by adding a feedback term Fx to the input h. 
Each step of this iterative method may also be interpreted as 
solving the following convex quadratic optimization problem 
based on the objective f(x) from (01: 

i (t+1) =argmin{/(x) + i(x-x ( ' ) ) T r(a;-x( t ))} (5) 

This is basically a regularized version of Newton's method to 
minimize f(x) where we regularize the step-size at each iter- 
ation. Typically, this regularization is used to ensure positive- 
definiteness of the Hessian matrix when Newton's method is 
used to optimize a non-convex function. We instead use it to 
ensure that J + V is walk-summable, so that the update step 
can be computed via Gaussian belief propagation. Intuitively, 
this will always move us closer to the correct solution, but 
slowly if r is large. It is simple to demonstrate the following: 



Lemma 2: Let J y and T ^ 0. Then, denned by © 
converges to i* = J _1 /i for all initializations x^ '. 

Comment. The proof is given for a general (non-diagonal) 
r y 0. For diagonal matrices, this is equivalent to requiring 
Tu > for i = 1, . . . , n. 

Proof. First, we note that there is only one possible fixed- 
point of the algorithm and this is x* = J^ 1 h. Suppose x is 
a fixed point: x = ( J + r) _1 (/i + Fx). Hence, (J + T)x = 
h + Fx and Jx = h. For non-singular J, we must then have 
x = J~ x h. Next, we show that the method converges. Let 



-,(/) 



x* denote the error of the fc-th estimate. The 



error dynamics are then e' t+1 ' = ( J+T)^ 1 Te^ t \ Thus, = 
((J + T)~ 1 r) k eS°) and the error converges to zero if and only 
if p((J + r)" 1 ! 1 ) < 1, or equivalently p(H) < 1 where H = 
(J + r)~ 1//2 r( J + T) -1 / 2 y is a symmetric positive semi- 
definite matrix. Thus, the eigenvalues of H are non-negative 
and we must show that they are less than one. It is simple to 
check that if A is an eigenvalue of H then is an eigenvalue 
of r x /2j-irV2 y o. This is seen as follows: Hx = \x, 
{J+T)- X r y = Xy (y = (J + r)- 1 / 2 ^), Ty = X(J + T)y, (1- 
X)Ty = XJy, J^Ty = j^y and T^J^T^z = j^z 
(z = T x / 2 y) [note that A / 1, otherwise Jy = contradicting 
J y 0]. Therefore > and < A < 1. Then p(H) < 1, 
e (t) o and fw — > x* completing the proof, o 

Now, provided we also require that J' = J + V is walk- 
summable, we may compute x^ t+1 ^ = (J + r) _1 /i(* +1 ), 
where M t+1 ) = /i + Tx^\ by performing Gaussian belief 
propagation to solve J'x^ t+1 ^ = h^ t+1 \ Thus, we obtain a 
double-loop method to solve Jx = h. The inner-loop performs 
GaBP and the outer-loop computes the next hS 1 '. The overall 
procedure converges provided the number of iterations of 
GaBP in the inner-loop is made large enough to ensure a good 
solution to J'x^ t+1 ^ — h,( t+1 \ Alternatively, we may compress 
this double-loop procedure into a single-loop procedure by 
preforming just one iteration of GaBP message-passing per 
iteration of the outer loop. Then it may become necessary 
to use the following damped update of h^' with step size 
parameter s 6 (0, 1): 



h {t+1) = (1 - s)h® + s(h + Tx W) 



/i + r((l-s)i ( *- 1) + sx (t >) 



(6) 



This single-loop method converges for sufficiently small val- 
ues of s. In practice, we have found good convergence with 
s = 5. This single-loop method can be more efficient than the 
double-loop method. 



V. Extension to General Linear Systems 

In this section, we efficiently extend the applicability of the 
proposed double-loop construction for a general linear system 
of equations (possibly over-constrained.) Given a full column 
rank matrix J 6 M nx,i: , n > k, and a shift vector h, we are 
interested in solving the least squares problem min x || Jx — 
/ill?,. The naive approach for using GaBP would be to take the 
information matrix J = (J T J), and the shift vector h = J T h. 



Note that J is positive definite and we can use GaBP to solve 
it. The MAP solution is 



J- 1 h={J T J)- 1 Jh, 



(7) 



which is the pseudo-inverse solution. 

Note, that the above construction has two drawbacks: first, 
we need to explicitly compute J and h, and second, J may not 
be sparse in case the original matrix J is sparse. To overcome 
this problem, following [9], we construct a new symmetric data 
matrix J based on the arbitrary rectangular matrix J € W nxk 

J T 

On X n 



J 



Ikxk 
J 



n(fe+n) x (fe+n) 



Additionally, we define a new hidden variable vector x = 
{x T ,z T } T G where x G R k is the solution vector 

and z § K ra is an auxiliary hidden vector, and a new shift 
vector h 4 {o£ xl , h T } T G R( k+n l 

Lemma 3: Solving x = J^h and taking the first k entries 
is identical to solving Eq. [7] 
Proof. Is given in [9]. 

For applying our double-loop construction on the new 
system (h, J) to obtain the solution to Eq. (O, we need to 
confirm that the matrix J is positive definite. (See lemma 2). 
To this end, we add a diagonal weighting —7/ to the lower 
right block: 



Ikxk J 



o(k+n) x (fe+n) 



Then we rescale J to make it unit diagonal (to deal with 
the negative sign of the lower right block we use a complex 
Gaussian notation as done in [8]). It is clear for a large enough 
7 we are left with a walk-summable model, where the rescaled 
J is a hermitian positive definite matrix and p(\J — I\) < 1, 
Now it is possible to use the double-loop technique to compute 
Eq. [7] Note that adding —7/ to the lower right block of J is 
equivalent to adding 7/ into Eq. 7: 



x = {J 1 J + 1 I)- 1 J T h 



(8) 



where 7 can be interpreted as a regularization parameter. 

VI. Experimental results 

A. Linear detection in linear channels 

Consider a discrete-time channel with a real input vec- 
tor x = {xi, . . . , xk } t governed by an arbitrary prior 
distribution, P x , and a corresponding real output vector 
y = {yi, . . . ,yn} T = f{x T } G R K . Here, the function /{■} 
denotes the channel transformation. By definition, linear de- 
tection compels the decision rule to be 



A{z*} = A{A~ 1 b} , 



(9) 



where b = y is the K x 1 observation vector and the 
K x K matrix A is a positive-definite symmetric matrix 
approximating the channel transformation. The vector x* is 
the solution (over M) to Ax = b. Estimation is completed 
by adjusting the (inverse) matrix-vector product to the input 



alphabet, dictated by P x , accomplished by using a proper 
clipping function A{ } {e.g., for binary signaling A{-} is the 
sign function). 

For example, linear channels, which appear extensively in 
many applications in communication and data storage systems, 
are characterized by the linear relation 

V = f{ x ) =Cx + n, 

where n is a K x 1 additive noise vector and C = S T S 
is a positive-definite symmetric matrix, often known as the 
correlation matrix. The N xK matrix S describes the physical 
channel medium while the vector y corresponds to the output 
of a bank of filters matched to the physical channel S. 

Assuming linear channels with AWGN with variance a 1 
as the ambient noise, the linear minimum mean-square error 
(MMSE) detector can be described by using A = C + ct 2 Ik, 
known to be optimal when the input distribution P x is Gaus- 
sian. In general, linear detection is suboptimal because of its 
deterministic underlying mechanism (i.e. , solving a given set 
of linear equations), in contrast to other estimation schemes, 
such as MAP or maximum likelihood, that emerge from an 
optimization criteria. 

B. Montanari's iterative algorithm for computing the MMSE 
detector 

Recent work by Montanari et al. [8] introduces an efficient 
iterative algorithm for computing the MMSE detector. Follow- 
ing this work, Bickson et al. showed that this algorithm is an 
instance of the GaBP algorithm [9]. 

In the current work, we apply our novel technique for 
forcing the convergence of Montanari's algorithm. To remind, 
Montanari's algorithm computes the MMSE solution 

x = (C + a 2 I K )- 1 y. 

We use the following setting: given a random-spreading 
CDMA code with chip sequence length n = 256, and k = 64 
users. We assume a diagonal AWGN with a 2 = 1. Matlab 
code of our implementation is available on [20]. 

Using the above settings, we have drawn at random random- 
spreading CDMA matrix. Typically, the sufficient convergence 
conditions for the GaBP algorithm do not hold. For example, 
we have drawn at random a randomly-spread CDMA matrix 
with p(\I K - C N \) = 4.24, where C N is a diagonally- 
normalized version of (C + u 2 1k)- Since p(\Ix — C N \) > 1, 
the GaBP algorithm for multiuser detection is not guaranteed 
to converge. 

Figure Q] shows that under the above settings, the GaBP 
algorithm indeed diverged. The x-axis represent iteration num- 
ber, while the values of different Xi are plotted using different 
colors. This figure depicts well the fluctuating divergence 
behavior. 

Next, we deployed our proposed construction and used a 
diagonal loading to force convergence. Figure [2] shows two 
different possible diagonal loadings. The x-axis shows the 
Newton step number, while the y-axis shows the residual. We 
experimented with two options of diagonal loading. In the 
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Fig. 1. Divergence of the GaBP algorithm for the multiuser detection 
problem, when n = 256, k = 64. 

first, we forced the matrix to be diagonally-dominant (DD). 
In this case, the spectral radius p = 0.188. In the second case, 
the matrix was not DD, but the spectral radius was p = 0.388. 
Clearly, the Newton method converges faster when the spectral 
radius is larger. In both cases the inner iterations converged in 
five steps to an accuracy of 10 -6 . 

Convergence of fixed GaBP iteration with n=256,k=64 
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Fig. 2. Convergence of the fixed GaBP iteration under the same settings 
(n = 256, k = 64) 

The tradeoff between the amount of diagonal weighting to 
the total convergence speed is shown in Figures 3,4. A CDMA 
multiuser detection problem is shown (k = 128, n = 256). 
Convergence threshold for the inner and outer loops where 
10 -6 and 10~ 3 . The x-axis present the amount of diagonal 
weighting normalized such that 1 is a diagonally-dominant 
matrix, y-axis represent the number of iterations. As expected, 
the outer loop number of iterations until convergence grows 
with the increase of 7. In contrast, the average number of inner 
loop iterations per Newton step (Figure 4) tends to decrease 
as 7 increases. The total number of iterations (inner x outer) 
represents the tradeoff between the inner and outer iterations 
and has a clear global minima. 

VII. Conclusions and Future Work 

We have presented an iterative method based on Gaussian 
belief propagation which always converges to the correct 
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Fig. 3. Effect of diagonal weighting on outer loop convergence speed. 
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Fig. 4. Effect of diagonal weighting on inner loop convergence speed. 



global solution, even in models where Gaussian belief prop- 
agation alone does not converge. Essentially, this involves 
adding a diagonal-loading term to force the model to become 
walk-summable such that GaBP converges in this modified 
model and adding a feedback mechanism that corrects the 
damping caused by the diagonal-loading term. 

We believe that there are numerous applications for our 
construction in many domains, since GaBP is related to the 
solution of linear systems of equations. As an example, we 
discuss the case of multiuser detection. We gave a concrete 
example, where a state-of-the-art linear iterative algorithm for 
detection fails to converge. Using our construction we are 
able to force convergence for computing the correct MMSE 
detector. 

There are a number of directions for further development. 
Most importantly, it would be very useful to develop a simple 
method to select T so as to optimize the rate of convergence 
of the overall method. In the double-loop method, it is seen 
that there is a trade-off in deciding how large T should 
be. For larger T (beyond the threshold of walk-summability) 
GaBP converges faster by accelerating the inner-loop of our 
algorithm. However, larger T will also make the outer-loop 
converge more slowly. Hence, we must somehow balance these 
competing objectives in choosing T. In the single-loop method, 
it would be useful to develop an adaptive method to optimize 
the step-size parameter s. Lastly, it may also prove useful 



to exploit a more general class of perturbations beyond the 
diagonal-loading method used in this paper. 
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